Accurately Predicting Transcription Start Sites Using Logitlinear Model and Local Oligonucleotide Frequencies

Authors

  • Jia Wang
  • Chuang Ma
  • Dao Zhou
  • Libin Zhang
  • Yanhong Zhou
Abstract

In this study, we construct a transcription start site (TSS) prediction model that combines a logitlinear model with genomic context features mined from promoter regions. We also develop a computational program, ProKey, that accurately predicts TSSs in long DNA sequences. Evaluated on the whole human genome, ProKey achieves 71.2% sensitivity and 76.3% specificity at a resolution of 2000 bp. In further comparisons, the correlation coefficient (CC) of ProKey is higher than that of DragonGSF and Eponine.
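The abstract does not give implementation details, so the following is only a minimal sketch of the general idea named in the title: counting local oligonucleotide (k-mer) frequencies in a sequence window and passing them through a logit-linear (logistic) scoring function. It is not ProKey's code. The function names, the choice of k = 2, and the weights are all hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=2):
    """Relative frequency of each k-mer over A/C/G/T in seq (overlapping windows).

    Hypothetical helper: the window size and k used by ProKey are not
    stated in the abstract.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only count k-mers made of unambiguous bases (ignores e.g. 'N').
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def logit_score(features, weights, bias=0.0):
    """Logistic function of a weighted sum of features.

    The weights are assumed to come from some training step that is
    not shown here; missing weights default to 0.
    """
    z = bias + sum(weights.get(name, 0.0) * value
                   for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    window = "GCGCGCTATAAAGGCGCGCC"       # made-up sequence window
    toy_weights = {"CG": 2.0, "TA": 1.0}  # made-up weights
    feats = kmer_frequencies(window, k=2)
    print(logit_score(feats, toy_weights, bias=-0.5))
```

In a setup like this, a long DNA sequence would be scanned window by window and positions whose score exceeds a chosen threshold would be reported as candidate TSSs. The threshold and window length are further assumptions not covered by the abstract.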


Similar resources

RBF-TSS: Identification of Transcription Start Site in Human Using Radial Basis Functions Network and Oligonucleotide Positional Frequencies

Accurate identification of promoter regions and transcription start sites (TSS) in genomic DNA allows for a more complete understanding of the structure of genes and gene regulation within a given genome. Many recently published methods have achieved high identification accuracy of TSS. However, models providing more accurate modeling of promoters and TSS are needed. A novel identification meth...


CisMapper: predicting regulatory interactions from transcription factor ChIP-seq data

Identifying the genomic regions and regulatory factors that control the transcription of genes is an important, unsolved problem. The current method of choice predicts transcription factor (TF) binding sites using chromatin immunoprecipitation followed by sequencing (ChIP-seq), and then links the binding sites to putative target genes solely on the basis of the genomic distance between them. Ev...


Inferring Biological Meaning from Cap Analysis Gene Expression Data

This project is inspired by the recent development of the Cap analysis gene expression (CAGE) method, which introduces new benefits to standard gene expression techniques such as being higher-throughput and more accurately mapping the transcription start sites [7]. Building off of previous serial analysis of gene expression (SAGE) methods, I aim to extract biological meaning from the gene expressi...


Analysis of consensus sequence patterns in Giardia cytoskeleton gene promoters.

Protein-coding genes in the ancient eukaryote Giardia lamblia lack typical promoter consensus elements. We have analysed the immediate 5' flanking sequences of seven genes of related function (structural cytoskeleton proteins) to identify shared DNA motifs that might have a role in transcription initiation. Transcription start sites for five genes have been determined previously. Genomic mappin...


Improved prediction of bacterial transcription start sites

MOTIVATION: Identifying bacterial promoters is an important step towards understanding gene regulation. In this paper, we address the problem of predicting the location of promoters and their transcription start sites (TSSs) in Escherichia coli. The accepted method for this problem is to use position weight matrices (PWMs), which define conserved motifs at the sigma-factor binding site. However ...




Publication date: 2011